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Abstract 

A novel approach to parallelize the well-known Hoshen-Kopelman al- 
gorithm has been chosen, suitable for simulating huge lattices in high 
dimensions on massively-parallel computers with distributed memory and 
message passing. This method consists of domain decomposition of the 
simulated lattice into strips perpendicular to the hyperplane of investiga- 
tion that is used in the Hoshen-Kopelman algorithm. Systems of world 
record sizes, up to L — 4000256 in two dimensions, L = 20224 in three, 
and L — 1036 in four, gave precise estimates for the Fisher exponent r, 
the corrections to scaling Ai, and for the critical number density n c . 

Percolation is a thoroughly studied model in statistical physics . The al- 
gorithm invented by Hoshen and Kopelman in 1976 allows for examining large 
percolation lattices using Monte Carlo methods ||. Unfortunately, this algo- 
rithm was invented for traditional sequential computers, and implementation 
on modern parallel computers is far from trivial. 

The normal way to parallelize an algorithm which works on regular data 
structures is domain decomposition. In this case, the investigated lattice is cut 
into strips, and each processor is assigned one such strip for investigation. As the 
Hoshen-Kopelman algorithm investigates the lattice hyperplane by hyperplane 
(line by line in two dimensions; plane by plane in three dimensions), various 
ways of domain decomposition can be characterized by this hyperplane: for 
example, strips parallel or perpendicular to that plane. 

The easiest way would be to choose parallel strips, as in that case the sim- 
ulation of each strip can be carried out locally on each processor, only after 
that communication is neccessary (exchanging the borders). Unfortunately, this 
means that each processor has to store a full hyperplane of L d ~ x sites for a L d 
system. In two dimensions, this no problem at all, but in higher dimensions this 
limits the size of systems that can be simulated. 

When decomposing the system in perpendicular strips, each processor has 
to store only a part of the hyperplane, and thus larger system sizes can be 



simulated. Unfortunately, this advantage comes at a price: Frequent interac- 
tion between neighbouring strips is neccessary throughout the simulation, thus 
imposing a burden on runtime and complicating drastically the implementation. 

This latter approach was chosen for a masters thesis Q ; a shortened version 
will be presented here. 

1 Parallelizing Hoshen-Kopelman 

The main problem when decomposing the lattice into perpendicular strips is 
to cope with the non-regular communication patterns arising from the Hoshen- 
Kopelman algorithm. 

1.1 Local and global clusters 

Cluster that are confined within a strip and are not in touch with interfaces 
between strips are called local; clusters that extend over several strips are called 
global. 

The Hoshen-Kopelman algorithm describes clusters by labels. In the sequen- 
tial version, a label can be a root label, describing a real cluster, or a non-root 
label pointing directly or indirectly to a root label (such labels are generated 
when during the counting a new cluster seems to emerge, but later is discovered 
to be part of another cluster). 

For the parallel version, it is straightfoward to introduce three types of labels: 
non-root labels, root labels associated with local clusters (local root labels), and 
root labels associated with parts of global clusters (global root labels). 

Associated with local root labels are the number of sites within the cor- 
responding cluster; global root labels carry the number of sites in the cluster 
within the strip, and additionally pointers to the left and right neighbour labels. 

During the simulation, a local label can be changed to global, when it extends 
to the interfaces of the strip. A global label can be changed to local during the 
recycling process, when neighbouring labels are discarded. 

When two previously different global clusters join at one site, the neigh- 
bouring strips have to be informed of that fact, in order to join parts of the 
corresponding cluster into one. This is called pairing, as pairs of labels are 
joined. To reduce the number of messages that have to be sent, we gather these 
and exchange information after the local part. 

1.2 Information exchange 

After the local part within each strip has been simulated, communication be- 
tween the processors takes place to find out if clusters extend over the interfaces, 
and what global labels are interconnected with each other. 
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1.3 Recycling 



An important step for economic memory usage is the frequent recycling of non- 
root labels, as these carry no information, but are just an artefact of the al- 
gorithm that speeds up computation. Additionally, also root labels can be 
recycled, when they are not present in the current hyperplane of investigation. 
These methods are known as Nakanishi recycling jl(| . 

In the parallel implementation, we can use the same technique for purely 
local labels; on the other hand, we have to be careful not to recycle labels that 
are still referenced by global labels in other strips. We do that by exchanging 
information about these references; during this process, references to non-root 
labels can be reclassified. In that way, we can delete all non-root labels, even 
those pointing to global root labels. 

Global root labels are recycled using a stepwise reduction of global clusters 
along the strip: the rightmost strip that carries a part of a global cluster checks 
if it is still alive; if not, it is recycled and the left neighbour is informed that its 
right partner has vanished. If the left neighbour has itself no left neighbour, it 
is converted to a local label. 

1.4 Step- by-step description of algorithm 

The following list is a semi-formal description of the algorithm. Local and 
communication part are repeated for each hyperplane the system consists of, 
recycling is done whenever neccessary after the local and communication part, 
and counting is done after the full system was examined. 

1. Initialization: Occupy the zeroth plane for busbar, if desired; initialize all 
data structures; etc. 

2. Local: 

(a) Examine the strip site by site. Do labeling. 

(b) When two different global clusters join at one site, generate pairing 
information for left and right neighbour, but defer communication 
until after the local part. 

3. Communication: 

(a) Exchange borders of strip with neighbours. 

(b) When two clusters of both strips join, convert clusters to global. 
If they are already global, but not yet connected, generate pairing 
information. 

(c) Exchange pairing information. Pair global labels that belong to- 
gether. During this, new pairing information can come up. 

(d) Check if recycling is neccessary due to tight memory conditions. 

4. Recycling (if neccessary): 
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(a) Reclassify the current hyperplane with root labels. 

(b) Delete all non-root labels that point to local root labels. 

(c) Reclassify the pointers to left and right of the global root labels by 
asking the neighbours for the corresponding root labels. 

(d) Delete all remaining non-root labels. 

(e) Mark all living local root labels and delete the non-marked ones. 

(f) Look for all global root labels that are not present in the current 
hyperplane and have no right neighbour; delete them and send the 
number of sites to the left neighbour. 

(g) When a global label is informed that its right neighbour was deleted, 
and it has no left neighbour, convert it to local. 

5. Counting: 

(a) Count local clusters. 

(b) Concentrate global clusters. 

(c) Count global clusters. 

(d) Look for a global cluster which has not been concentrated. If it exists, 
we have connectivity. Sum up this cluster explicitly. 

(e) Do output. 

2 Results of Monte Carlo Studies 

All simulations were done right at the critical threshold p c (except those for 
Fig. 2) and on square (2d), simple cubic (3d), and simple hypercubic (4d) lat- 
tices. The values for p c were taken from literature: 

2d Pc = 0.5927464 cf. || 

3d Pc = 0.311608 cf. [7| 

4d p c = 0.196889 cf. || 
As boundary conditions, one direction was busbar in two dimensions resp. 
open in higher dimensions, one direction was periodic, and for more than two 
dimensions all other directions were helical. This mix came quite natural during 
implementing the parallel algorithm. 

2.1 Cluster size distribution 

For the number n s of clusters of size s in our system, we expect a distribution 
of kind n s oa s~ T right at the critical threshold p c - To make analysis easier, we 
look at N s , the number of clusters with at least s sites: 




(1) 



s' =S 
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In a double-logarithmic plot, we would expect a straight line with slope — r + 
1. Deviations from the power law would not be visible due to the logarithmic 
scale, thus we plot s T ~ 1 N s linearly on the y-axis. We can clearly see in Fig. 2.1 
the deviation for small s (due to corrections to scaling, see below) and for large 
s (due to boundary conditions). 
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Figure 1: Cluster size distribution for L = 4000256 in two dimensions. The 
dashed line corresponds to the theoretically asymptotic behaviour. 



The values found for the proportionality constant ko (normalized by number 
of lattice sites) and the exponent r are: 
2d k = 0.032627(6) r = 187/910 
3d k = 0.057423(3) t = 2.190(2) 
4d fc = 0.0611(4) t = 2.313(2) 
Below p c , we e xpect the size of the largest cluster to be proportional to 
log(L), cf. Fig. P 



2.2 Corrections to scaling 

For small s, the clusters do not follow the power-law n s oc s~ T ; the system 
is not self-similar under renormalization, as the lattice spacing is an inherent 
length, very visible for small clusters. Thus our power-law is modified: n s = 
kgs~ T (l — kis~ Al ). We find the exponent Ai (and the factor fci) by plotting 
log( J ?V s /(/cos~ T+1 ) — 1) against logs; for carefully adjusted ko and r, this yields 
a straight line with slope Ai and intercept k\. 

We get a straight line only for small and intermediate s. For large s, the 
finite-size effects increase the n s over the expected value and limit the precision, 

*the value for r is supposed to be known exactly 
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Figure 2: Average size of the largest cluster versus system size L at p — 0.5 in 
two dimensions. The error bars represent the statistical error. Number of runs 
done for each L: eight for L < 200 k, four for 200 k < L < 1 M, two for L > 1 M. 



cf. Fig. 2.2. Implementing fully periodic boundary conditions could give an 



improvement in precision. 

The values found for the corrections to scaling amplitude and exponent are: 
2d hi = 0.50(2) Ai = 0.70(2) 
M h = 0.48(5) Ai = 0.60(8) 
Ad h= 0.5(1) Ai = 0.5(1) 



2.3 Number density 

The total number of clusters divided by the number of sites in the lattice is called 
the number density, right at p c the critical number density. For small systems, 
we expect finite-size corrections proportional to 1/L. Statistical fluctuations 
depend on the total number of sites in the lattice and are proportional to l/L d . 
Thus finite-size effects can be seen well in higher dimensions. 

For two dimensions, we can simulate large L and have no visible finite- 
size effects. But we can clearly see a dependence on the used random number 
generator. The classic and simple ibm = ibm * 16807 produces wrong results, 
whereas others agree well with each other (i. e. R(103,250), R(471, 1586,6988,9689), 
R(18,36,37, 71, 89, 124), 11 even ibm = ibm * 13**13 seems to be acceptable). 
Thus the number density can serve as a test for bad random numbers, cf. Fig. |2.3[ 

The values found for the number density are: 
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Figure 3: Corrections to scaling in two dimensions, for L = 500k (+) and 
L = 4M ( x ) . The larger system offers higher precision, as the finite-size effects 
are weaker. 
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Figure 4: Number densities at p c for various system sizes L in two dimensions, 
using different PRNGs: R(103,250) (+), R(471, 1586,6988,9689) (x), ibm*16807 
(boxes), R(18,36,37,71,89,124) (circles), andibm*13 13 (triangles). The solid line 
corresponds to the average for the L = 1M runs of the two-tap, four-tap, and 
six-tap R(. . . ) generators, the dashed lines to the statistical error. 
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2d n c = 0.02759791(5) 
3d n c = 0.0524387(3) 
Ad n c = 0.0519980(2) 



3 Efficiency 

When implementing an algorithm on a parallel computer, we want to know 
how efficient our implementation is, i. e. if not too much time is wasted for 
communication . 

The amount of neccessary communication is roughly proportional to the 
interface between the strips, i. c. NL d_1 . This communication is the main 
difference between the sequential and parallel implementation, and reduces the 
parallel efficiency. 

In two dimensions, a sequential implementation is roughly as fast as the 
parallel one, which means that the effort for communication is neglectable. In 
four dimensions, the fraction of runtime needed for communication is significant, 
but the time for the local part is still larger. 

Although the domain decomposition in perpendicular strips requires more 
communication than other methods, it is still efficient enough for up to four 
dimensions. For even higher dimensions, this may change. 

4 Summary 

Parallelizing the Hoshen-Kopelman by dividing the lattice into strips perpendic- 
ular to the hyperplane of investigation is a viable approach. Using this method, 
world record simulations in two, three, and four dimensions have been carried 
out, beating the old world records. 4 ' 5 High precision results for the Fisher ex- 
ponent r, the corrections to scaling exponent Ai, and the number density n c 
have been obtained. The values found are in reasonable agreement with those 
from other groups. 4 ' 5,9 
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